Course: Advanced Methods in Data Mining

Exploratory Methods from the R Console (Clustering, Unsupervised Learning)

Principal Component Analysis

Biplots

Students Example

## Standard deviations (1, .., p=5):
## [1] 1.70095552 1.27618589 0.58872409 0.35016062 0.09429419
## 
## Rotation (n x k) = (5 x 5):
##                    PC1         PC2         PC3         PC4        PC5
## Matematicas -0.5266440 -0.27049630  0.43820071 -0.26121779 -0.6238776
## Ciencias    -0.4249362 -0.50807221  0.04049491  0.67362724  0.3253895
## Espanol     -0.3591470  0.56208159  0.56227583 -0.07008647  0.4837473
## Historia    -0.3526975  0.58648985 -0.39418032  0.44664495 -0.4204335
## EdFisica     0.5373018  0.09374599  0.57862603  0.52305619 -0.3067941
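Output of this kind can be obtained with `prcomp` from base R. The following is a minimal sketch, assuming the data are the student grades listed further down in this document (the object names `datos` and `res` are illustrative choices, not taken from the original). Because `scale. = TRUE` standardizes the variables, the squared standard deviations add up to the number of variables.

```r
# Student grades, copied from the data listing later in this document.
datos <- data.frame(
  Matematicas = c(7.0, 7.5, 7.6, 5.0, 6.0, 7.8, 6.3, 7.9, 6.0, 6.8),
  Ciencias    = c(6.5, 9.4, 9.2, 6.5, 6.0, 9.6, 6.4, 9.7, 6.0, 7.2),
  Espanol     = c(9.2, 7.3, 8.0, 6.5, 7.8, 7.7, 8.2, 7.5, 6.5, 8.7),
  Historia    = c(8.6, 7.0, 8.0, 7.0, 8.9, 8.0, 9.0, 8.0, 5.5, 9.0),
  EdFisica    = c(8.0, 7.0, 7.5, 9.0, 7.3, 6.5, 7.2, 6.0, 8.7, 7.0),
  row.names   = c("Lucia", "Pedro", "Ines", "Luis", "Andres",
                  "Ana", "Carlos", "Jose", "Sonia", "Maria")
)

# PCA on centered and standardized variables (correlation-matrix PCA).
res <- prcomp(datos, scale. = TRUE)

# Prints the standard deviations and the rotation (loadings) matrix.
print(res)

# Proportion of variance explained by each component.
summary(res)
```

The sign of each component is arbitrary, so a whole column of the rotation matrix may appear with flipped signs on another machine or in another package.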

Types of Plots

Default (prcomp from stats)
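A minimal sketch of the default biplot, assuming the same student grades listed later in this document. `biplot` is the base R method for `prcomp` objects; the `cex` value is only an illustrative choice.

```r
# Student grades, copied from the data listing later in this document.
datos <- data.frame(
  Matematicas = c(7.0, 7.5, 7.6, 5.0, 6.0, 7.8, 6.3, 7.9, 6.0, 6.8),
  Ciencias    = c(6.5, 9.4, 9.2, 6.5, 6.0, 9.6, 6.4, 9.7, 6.0, 7.2),
  Espanol     = c(9.2, 7.3, 8.0, 6.5, 7.8, 7.7, 8.2, 7.5, 6.5, 8.7),
  Historia    = c(8.6, 7.0, 8.0, 7.0, 8.9, 8.0, 9.0, 8.0, 5.5, 9.0),
  EdFisica    = c(8.0, 7.0, 7.5, 9.0, 7.3, 6.5, 7.2, 6.0, 8.7, 7.0),
  row.names   = c("Lucia", "Pedro", "Ines", "Luis", "Andres",
                  "Ana", "Carlos", "Jose", "Sonia", "Maria")
)

res <- prcomp(datos, scale. = TRUE)

# Individuals are drawn as labels and variables as arrows,
# both on the first two components.
biplot(res, cex = 0.8)
```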

Using factoextra
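A hedged sketch of how the objects and plots in this part could be produced. It assumes that the `FactoMineR` and `factoextra` packages are installed and that the data are the student grades listed later in this document. The `requireNamespace` guard and the object names are illustrative additions, not part of the original.

```r
# Student grades, copied from the data listing later in this document.
datos <- data.frame(
  Matematicas = c(7.0, 7.5, 7.6, 5.0, 6.0, 7.8, 6.3, 7.9, 6.0, 6.8),
  Ciencias    = c(6.5, 9.4, 9.2, 6.5, 6.0, 9.6, 6.4, 9.7, 6.0, 7.2),
  Espanol     = c(9.2, 7.3, 8.0, 6.5, 7.8, 7.7, 8.2, 7.5, 6.5, 8.7),
  Historia    = c(8.6, 7.0, 8.0, 7.0, 8.9, 8.0, 9.0, 8.0, 5.5, 9.0),
  EdFisica    = c(8.0, 7.0, 7.5, 9.0, 7.3, 6.5, 7.2, 6.0, 8.7, 7.0),
  row.names   = c("Lucia", "Pedro", "Ines", "Luis", "Andres",
                  "Ana", "Carlos", "Jose", "Sonia", "Maria")
)

# Run only when both packages are available.
if (requireNamespace("FactoMineR", quietly = TRUE) &&
    requireNamespace("factoextra", quietly = TRUE)) {
  # Standardized PCA keeping all five components, without automatic plots.
  res <- FactoMineR::PCA(datos, scale.unit = TRUE, ncp = 5, graph = FALSE)

  print(res)      # list of available result objects
  print(res$eig)  # eigenvalues and percentages of variance
  print(res$var)  # coordinates, correlations, cos2, contributions
  print(res$ind)  # same quantities for the individuals, plus distances

  # ggplot2-based plots from factoextra.
  print(factoextra::fviz_pca_ind(res, repel = TRUE))     # principal plane
  print(factoextra::fviz_pca_var(res))                   # correlation circle
  print(factoextra::fviz_pca_biplot(res, repel = TRUE))  # both overlaid
}
```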

## **Results for the Principal Component Analysis (PCA)**
## The analysis was performed on 10 individuals, described by 5 variables
## *The results are available in the following objects:
## 
##    name               description                          
## 1  "$eig"             "eigenvalues"                        
## 2  "$var"             "results for the variables"          
## 3  "$var$coord"       "coord. for the variables"           
## 4  "$var$cor"         "correlations variables - dimensions"
## 5  "$var$cos2"        "cos2 for the variables"             
## 6  "$var$contrib"     "contributions of the variables"     
## 7  "$ind"             "results for the individuals"        
## 8  "$ind$coord"       "coord. for the individuals"         
## 9  "$ind$cos2"        "cos2 for the individuals"           
## 10 "$ind$contrib"     "contributions of the individuals"   
## 11 "$call"            "summary statistics"                 
## 12 "$call$centre"     "mean of the variables"              
## 13 "$call$ecart.type" "standard error of the variables"    
## 14 "$call$row.w"      "weights for the individuals"        
## 15 "$call$col.w"      "weights for the variables"
##         eigenvalue percentage of variance
## comp 1 2.893249673             57.8649935
## comp 2 1.628650425             32.5730085
## comp 3 0.346596049              6.9319210
## comp 4 0.122612460              2.4522492
## comp 5 0.008891393              0.1778279
##        cumulative percentage of variance
## comp 1                          57.86499
## comp 2                          90.43800
## comp 3                          97.36992
## comp 4                          99.82217
## comp 5                         100.00000
## $coord
##                  Dim.1      Dim.2       Dim.3       Dim.4       Dim.5
## Matematicas  0.8957980 -0.3452036  0.25797931 -0.09146818  0.05882803
## Ciencias     0.7227976 -0.6483946  0.02384033  0.23587773 -0.03068234
## Espanol      0.6108931  0.7173206  0.33102532 -0.02454152 -0.04561456
## Historia     0.5999227  0.7484701 -0.23206345  0.15639747  0.03964443
## EdFisica    -0.9139265  0.1196373  0.34065108  0.18315368  0.02892890
## 
## $cor
##                  Dim.1      Dim.2       Dim.3       Dim.4       Dim.5
## Matematicas  0.8957980 -0.3452036  0.25797931 -0.09146818  0.05882803
## Ciencias     0.7227976 -0.6483946  0.02384033  0.23587773 -0.03068234
## Espanol      0.6108931  0.7173206  0.33102532 -0.02454152 -0.04561456
## Historia     0.5999227  0.7484701 -0.23206345  0.15639747  0.03964443
## EdFisica    -0.9139265  0.1196373  0.34065108  0.18315368  0.02892890
## 
## $cos2
##                 Dim.1      Dim.2        Dim.3        Dim.4        Dim.5
## Matematicas 0.8024540 0.11916550 0.0665533242 0.0083664287 0.0034607374
## Ciencias    0.5224364 0.42041555 0.0005683612 0.0556383052 0.0009414059
## Espanol     0.3731904 0.51454884 0.1095777630 0.0006022863 0.0020806881
## Historia    0.3599073 0.56020745 0.0538534429 0.0244601695 0.0015716811
## EdFisica    0.8352616 0.01431309 0.1160431572 0.0335452699 0.0008368811
## 
## $contrib
##                Dim.1      Dim.2      Dim.3      Dim.4    Dim.5
## Matematicas 27.73539  7.3168250 19.2019858  6.8234735 38.92233
## Ciencias    18.05708 25.8137375  0.1639838 45.3773665 10.58783
## Espanol     12.89866 31.5935718 31.6154103  0.4912113 23.40115
## Historia    12.43955 34.3970346 15.5378121 19.9491712 17.67643
## EdFisica    28.86932  0.8788311 33.4808079 27.3587774  9.41226
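The variable results above can be recomputed with base R alone, which makes their definitions explicit. This is a sketch under the assumption that the data are the student grades listed later in this document. The names `coord`, `cos2` and `contrib` simply mirror the labels in the output. The coordinate of a variable is its loading times the component's standard deviation (equal to its correlation with the component), cos2 is the square of that coordinate, and the contribution is cos2 as a percentage of the eigenvalue.

```r
# Student grades, copied from the data listing later in this document.
datos <- data.frame(
  Matematicas = c(7.0, 7.5, 7.6, 5.0, 6.0, 7.8, 6.3, 7.9, 6.0, 6.8),
  Ciencias    = c(6.5, 9.4, 9.2, 6.5, 6.0, 9.6, 6.4, 9.7, 6.0, 7.2),
  Espanol     = c(9.2, 7.3, 8.0, 6.5, 7.8, 7.7, 8.2, 7.5, 6.5, 8.7),
  Historia    = c(8.6, 7.0, 8.0, 7.0, 8.9, 8.0, 9.0, 8.0, 5.5, 9.0),
  EdFisica    = c(8.0, 7.0, 7.5, 9.0, 7.3, 6.5, 7.2, 6.0, 8.7, 7.0),
  row.names   = c("Lucia", "Pedro", "Ines", "Luis", "Andres",
                  "Ana", "Carlos", "Jose", "Sonia", "Maria")
)

res <- prcomp(datos, scale. = TRUE)
eig <- res$sdev^2  # eigenvalues of the correlation matrix

# Variable coordinates: loadings scaled by the component standard deviation.
# Component signs are arbitrary, so whole columns may be flipped.
coord <- sweep(res$rotation, 2, res$sdev, FUN = "*")

# Quality of representation of each variable on each component.
cos2 <- coord^2

# Contribution (in percent) of each variable to each component.
contrib <- 100 * sweep(cos2, 2, eig, FUN = "/")

round(coord, 4)
round(cos2, 4)
round(contrib, 2)
```

Each row of `cos2` sums to 1 across the five components, and each column of `contrib` sums to 100.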
## $coord
##              Dim.1      Dim.2       Dim.3       Dim.4        Dim.5
## Lucia   0.32306263  1.7725245  1.19880074 -0.05501532  0.003633384
## Pedro   0.66544057 -1.6387021  0.14547628 -0.02306468 -0.123377296
## Ines    1.00254705 -0.5156925  0.62888764  0.51644351  0.142875824
## Luis   -3.17209481 -0.2627820 -0.38196027  0.67777691 -0.062503554
## Andres -0.48886797  1.3654021 -0.83523570 -0.15579197  0.123367255
## Ana     1.70863322 -1.0217004 -0.12707707  0.06683295  0.025291503
## Carlos  0.06758577  1.4623364 -0.50624044 -0.11792847  0.013123980
## Jose    2.01185516 -1.2758646 -0.54215002 -0.19778670  0.017434170
## Sonia  -3.04203029 -1.2548807  0.44882861 -0.63999876  0.037884840
## Maria   0.92386867  1.3693593 -0.02932977 -0.07146746 -0.177730107
## 
## $cos2
##              Dim.1       Dim.2       Dim.3        Dim.4        Dim.5
## Lucia  0.022270827 0.670420670 0.306659839 0.0006458478 2.816992e-06
## Pedro  0.139905502 0.848430539 0.006686527 0.0001680781 4.809354e-03
## Ines   0.514468899 0.136122895 0.202439714 0.1365196756 1.044882e-02
## Luis   0.936851990 0.006429392 0.013583605 0.0427712757 3.637375e-04
## Andres 0.084139511 0.656353715 0.245603703 0.0085448999 5.358172e-03
## Ana    0.732686110 0.261979570 0.004052795 0.0011209894 1.605349e-04
## Carlos 0.001892733 0.886081139 0.106192189 0.0057625700 7.136907e-05
## Jose   0.673612108 0.270910359 0.048916504 0.0065104446 5.058468e-05
## Sonia  0.808829929 0.137636943 0.017607237 0.0358004434 1.254472e-04
## Maria  0.308554271 0.677869212 0.000310977 0.0018464085 1.141913e-02
## 
## $contrib
##              Dim.1      Dim.2       Dim.3       Dim.4       Dim.5
## Lucia   0.36073437 19.2910834 41.46392357  0.24684974  0.01484748
## Pedro   1.53049754 16.4881591  0.61060555  0.04338706 17.11987788
## Ines    3.47395038  1.6328779 11.41096846 21.75259335 22.95871968
## Luis   34.77814436  0.4239976  4.20932799 37.46613853  4.39379307
## Andres  0.82603273 11.4470414 20.12771563  1.97950024 17.11709152
## Ana    10.09047896  6.4094282  0.46591936  0.36428947  0.71941493
## Carlos  0.01578791 13.1300601  7.39418080  1.13423412  0.19371414
## Jose   13.98967133  9.9949649  8.48038057  3.19050613  0.34184774
## Sonia  31.98461714  9.6688984  5.81215853 33.40593699  1.61421395
## Maria   2.95008527 11.5134890  0.02481953  0.41656436 35.52647960
## 
## $dist
##    Lucia    Pedro     Ines     Luis   Andres      Ana   Carlos     Jose 
## 2.164804 1.779065 1.397736 3.277258 1.685356 1.996135 1.553497 2.451273 
##    Sonia    Maria 
## 3.382478 1.663200
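The results for the individuals can be recomputed in the same spirit. This sketch assumes the student grades listed later in this document. It also rescales the `prcomp` scores by sqrt(n / (n - 1)), because the output above is consistent with standardization using divisor n (for each dimension, the squared coordinates average to the eigenvalue), whereas `prcomp` uses n - 1. The object names are illustrative.

```r
# Student grades, copied from the data listing later in this document.
datos <- data.frame(
  Matematicas = c(7.0, 7.5, 7.6, 5.0, 6.0, 7.8, 6.3, 7.9, 6.0, 6.8),
  Ciencias    = c(6.5, 9.4, 9.2, 6.5, 6.0, 9.6, 6.4, 9.7, 6.0, 7.2),
  Espanol     = c(9.2, 7.3, 8.0, 6.5, 7.8, 7.7, 8.2, 7.5, 6.5, 8.7),
  Historia    = c(8.6, 7.0, 8.0, 7.0, 8.9, 8.0, 9.0, 8.0, 5.5, 9.0),
  EdFisica    = c(8.0, 7.0, 7.5, 9.0, 7.3, 6.5, 7.2, 6.0, 8.7, 7.0),
  row.names   = c("Lucia", "Pedro", "Ines", "Luis", "Andres",
                  "Ana", "Carlos", "Jose", "Sonia", "Maria")
)

n   <- nrow(datos)
res <- prcomp(datos, scale. = TRUE)
eig <- res$sdev^2

# Scores rescaled to the divisor-n convention (signs may be flipped).
coord <- res$x * sqrt(n / (n - 1))

# Squared distance of each individual to the center of gravity.
dist2 <- rowSums(coord^2)

# Quality of representation of each individual on each component.
cos2 <- sweep(coord^2, 1, dist2, FUN = "/")

# Contribution (in percent) of each individual to each component,
# with equal weights 1/n.
contrib <- 100 * sweep(coord^2, 2, n * eig, FUN = "/")

round(sqrt(dist2), 4)  # distances, comparable to $dist
round(cos2, 4)
round(contrib, 2)
```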

Poorly Represented Individuals and Variables

##       1979       1980       1981       1982       1983       1984 
## 96.1000354 86.9743777 80.6523653 70.7456314 65.6042706 85.9034700 
##       1985       1986       1987       1988 
## 75.1710823 81.9688762  4.9898384  0.8100985
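The percentages above read as the quality of representation of each individual on the first principal plane, that is, the sum of its cos2 on components 1 and 2 times 100. The yearly data behind that output is not included in this document, so the sketch below uses the student grades as a stand-in. The helper `quality` and the 70% threshold are hypothetical choices for illustration.

```r
# Student grades, copied from the data listing later in this document.
datos <- data.frame(
  Matematicas = c(7.0, 7.5, 7.6, 5.0, 6.0, 7.8, 6.3, 7.9, 6.0, 6.8),
  Ciencias    = c(6.5, 9.4, 9.2, 6.5, 6.0, 9.6, 6.4, 9.7, 6.0, 7.2),
  Espanol     = c(9.2, 7.3, 8.0, 6.5, 7.8, 7.7, 8.2, 7.5, 6.5, 8.7),
  Historia    = c(8.6, 7.0, 8.0, 7.0, 8.9, 8.0, 9.0, 8.0, 5.5, 9.0),
  EdFisica    = c(8.0, 7.0, 7.5, 9.0, 7.3, 6.5, 7.2, 6.0, 8.7, 7.0),
  row.names   = c("Lucia", "Pedro", "Ines", "Luis", "Andres",
                  "Ana", "Carlos", "Jose", "Sonia", "Maria")
)

res  <- prcomp(datos, scale. = TRUE)
cos2 <- sweep(res$x^2, 1, rowSums(res$x^2), FUN = "/")

# Percentage of an individual's squared distance captured by a plane.
quality <- function(cos2, dims) 100 * rowSums(cos2[, dims, drop = FALSE])

q12 <- quality(cos2, c(1, 2))  # plane of components 1 and 2
q13 <- quality(cos2, c(1, 3))  # plane of components 1 and 3

# Individuals below the (hypothetical) threshold on plane 1-2 ...
threshold <- 70
poor <- names(q12)[q12 < threshold]
poor

# ... and how well the same individuals are seen on plane 1-3.
round(cbind(plane_1_2 = q12, plane_1_3 = q13)[poor, , drop = FALSE], 2)
```

An individual with a low percentage on plane 1-2 should not be interpreted there; checking another plane, such as 1-3, shows whether it is better represented elsewhere.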

Interpreting the Poorly Represented Individuals

Principal Plane and Correlation Circle for Components 1 and 3

Default (FactoMineR)

In 1987 and 1988, Mexico's imports were strong in Costa Rica and Honduras.

Principal Component Analysis (PCA) with Mixed Categorical and Numerical Variables

It is completely incorrect to run a PCA, a hierarchical clustering, or a k-means on qualitative variables by simply converting the categories to codes, for example replacing Female with 1 and Male with 2, because neither the algebra nor the distance formulas make any sense on such codes. The options are:

PCA Using “Dummy” Variables (Complete Disjunctive Coding)
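A small sketch of why integer codes mislead distance-based methods while disjunctive coding does not. The three-level factor `color` is a hypothetical example, not data from this course. With integer codes, the distance between two categories depends on the arbitrary order of the levels. With one 0/1 column per level, every pair of distinct categories is equally far apart.

```r
# Hypothetical unordered factor, used only for illustration.
color <- factor(c("red", "green", "blue"))

# Integer codes follow the alphabetical order of the levels
# (blue = 1, green = 2, red = 3), which carries no real meaning.
codes <- as.integer(color)
names(codes) <- as.character(color)
d_codes <- dist(codes)
d_codes  # red and blue appear twice as far apart as red and green

# Complete disjunctive coding: one indicator column per level.
dummy <- model.matrix(~ color - 1)
rownames(dummy) <- as.character(color)
d_dummy <- dist(dummy)
d_dummy  # all pairs of distinct categories are at the same distance
```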

Using the dummies package

dummies-1.5.6 provided by Decision Patterns
       Matematicas Ciencias Espanol Historia EdFisica Sexo  Provincia
Lucia          7.0      6.5     9.2      8.6      8.0    F Puntarenas
Pedro          7.5      9.4     7.3      7.0      7.0    M Puntarenas
Ines           7.6      9.2     8.0      8.0      7.5    F Puntarenas
Luis           5.0      6.5     6.5      7.0      9.0    M Puntarenas
Andres         6.0      6.0     7.8      8.9      7.3    M Puntarenas
Ana            7.8      9.6     7.7      8.0      6.5    F Puntarenas
Carlos         6.3      6.4     8.2      9.0      7.2    M Puntarenas
Jose           7.9      9.7     7.5      8.0      6.0    M Puntarenas
Sonia          6.0      6.0     6.5      5.5      8.7    F Puntarenas
Maria          6.8      7.2     8.7      9.0      7.0    F Puntarenas
       Conducta
Lucia         3
Pedro         2
Ines          2
Luis          1
Andres        2
Ana           3
Carlos        1
Jose          1
Sonia         2
Maria         3
'data.frame':   10 obs. of  8 variables:
 $ Matematicas: num  7 7.5 7.6 5 6 7.8 6.3 7.9 6 6.8
 $ Ciencias   : num  6.5 9.4 9.2 6.5 6 9.6 6.4 9.7 6 7.2
 $ Espanol    : num  9.2 7.3 8 6.5 7.8 7.7 8.2 7.5 6.5 8.7
 $ Historia   : num  8.6 7 8 7 8.9 8 9 8 5.5 9
 $ EdFisica   : num  8 7 7.5 9 7.3 6.5 7.2 6 8.7 7
 $ Sexo       : Factor w/ 2 levels "F","M": 1 2 1 2 2 1 2 2 1 1
 $ Provincia  : Factor w/ 1 level "Puntarenas": 1 1 1 1 1 1 1 1 1 1
 $ Conducta   : int  3 2 2 1 2 3 1 1 2 3
[1] 10  8
Warning in model.matrix.default(~x - 1, model.frame(~x - 1), contrasts =
FALSE): non-list contrasts argument ignored
'data.frame':   10 obs. of  8 variables:
 $ Matematicas: num  7 7.5 7.6 5 6 7.8 6.3 7.9 6 6.8
 $ Ciencias   : num  6.5 9.4 9.2 6.5 6 9.6 6.4 9.7 6 7.2
 $ Espanol    : num  9.2 7.3 8 6.5 7.8 7.7 8.2 7.5 6.5 8.7
 $ Historia   : num  8.6 7 8 7 8.9 8 9 8 5.5 9
 $ EdFisica   : num  8 7 7.5 9 7.3 6.5 7.2 6 8.7 7
 $ Sexo.F     : int  1 0 1 0 0 1 0 0 1 1
 $ Sexo.M     : int  0 1 0 1 1 0 1 1 0 0
 $ Conducta   : int  3 2 2 1 2 3 1 1 2 3
 - attr(*, "dummies")=List of 1
  ..$ Sexo: int  6 7
[1] 10  8
       Matematicas Ciencias Espanol Historia EdFisica Sexo.F Sexo.M
Lucia          7.0      6.5     9.2      8.6      8.0      1      0
Pedro          7.5      9.4     7.3      7.0      7.0      0      1
Ines           7.6      9.2     8.0      8.0      7.5      1      0
Luis           5.0      6.5     6.5      7.0      9.0      0      1
Andres         6.0      6.0     7.8      8.9      7.3      0      1
Ana            7.8      9.6     7.7      8.0      6.5      1      0
       Conducta
Lucia         3
Pedro         2
Ines          2
Luis          1
Andres        2
Ana           3
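The same disjunctive coding can be sketched with base R only, which may be useful when the `dummies` package is not available. This uses `model.matrix` as a swapped-in technique, not the package call that produced the output above. The object names are illustrative, and `Conducta` and `Provincia` are left out to keep the example short.

```r
# Student grades and sex, copied from the data listing above.
datos <- data.frame(
  Matematicas = c(7.0, 7.5, 7.6, 5.0, 6.0, 7.8, 6.3, 7.9, 6.0, 6.8),
  Ciencias    = c(6.5, 9.4, 9.2, 6.5, 6.0, 9.6, 6.4, 9.7, 6.0, 7.2),
  Espanol     = c(9.2, 7.3, 8.0, 6.5, 7.8, 7.7, 8.2, 7.5, 6.5, 8.7),
  Historia    = c(8.6, 7.0, 8.0, 7.0, 8.9, 8.0, 9.0, 8.0, 5.5, 9.0),
  EdFisica    = c(8.0, 7.0, 7.5, 9.0, 7.3, 6.5, 7.2, 6.0, 8.7, 7.0),
  Sexo        = factor(c("F", "M", "F", "M", "M", "F", "M", "M", "F", "F")),
  row.names   = c("Lucia", "Pedro", "Ines", "Luis", "Andres",
                  "Ana", "Carlos", "Jose", "Sonia", "Maria")
)

# One 0/1 indicator column per level of Sexo (no intercept column).
sexo_dummy <- model.matrix(~ Sexo - 1, data = datos)
colnames(sexo_dummy) <- paste0("Sexo.", levels(datos$Sexo))

# Numeric columns plus the indicator columns.
es_numerica <- vapply(datos, is.numeric, logical(1))
datos_dummy <- cbind(datos[, es_numerica], sexo_dummy)
str(datos_dummy)

# PCA on the recoded table. Sexo.F + Sexo.M is always 1, so the two
# indicators are perfectly collinear and the last component has
# (numerically) zero variance.
res <- prcomp(datos_dummy, scale. = TRUE)
summary(res)
```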